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Abstract 

We consider the distributed source coding problem in which correlated data picked up by scattered sensors has to 
be encoded separately and transmitted to a common receiver, subject to a rate-distortion constraint. Although near-to- 
optimal solutions based on Turbo and LDPC codes exist for this problem, in most cases the proposed techniques do 
not scale to networks of hundreds of sensors. We present a scalable solution based on the following key elements: (a) 
distortion-optimized index assignments for low-complexity distributed quantization, (b) source-optimized hierarchical 
clustering based on the Kullback-Leibler distance and (c) sum-product decoding on specific factor graphs exploiting 
the correlation of the data. 

Index Terms 

istributed source coding, hierarchical clustering, quantizer design, source and correlation modelsistributed source 
coding, hierarchical clustering, quantizer design, source and correlation modelsD 

I. Introduction 

In distributed sensing scenarios, where correlated data has to be gathered by a large number of low-complexity, power- 
restricted sensors, efficient source coding and data gathering techniques are key towards reducing the required number 
of transmissions and enabling extended network life-time. Inspired by the seminal work of Slepian and Wolf [24], 
characterizing the fundamental limits of separate encoding of correlated sources, several authors have contributed 
with distributed source coding solutions (see e.g. [28] and references therein). Focusing on scalar quantization, Flynn 
and Gray [10] provided one of the first practical approaches to construct distributed source codes for two continuous- 
valued sources. The basic idea behind this approach — which will also play an important role in our work — is to reuse 
the indices of a high-resolution quantizer such that the overall end-to-end distortion after joint decoding is minimized. 
Pradhan and Ramchandran presented in [19] a method called distributed source coding using syndromes (DISCUS), 
based on channel codes with good distance properties, where the set of possible codewords is partitioned into co-sets 
and only the co-set's syndrome and not the actual codeword is transmitted to the decoder. This method, originally 
considered for an asymmetric scenario where information about one source is available as side information at the 
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decoder, was recently extended to the symmetric case [20] where all sources are to be encoded and side information 
is not available at the decoder. An alternative approach for the asymmetric scenario was provided by Zamir et al. [29] 
and by Servetto in [23] where a constructive approach for Gaussian sources based on linear codes and nested lattices 
was presented. Cardinal and Van Assche [4] as well as Rebollo-Monedero et al. [21] focused on the optimization 
of the quantization stage and proposed design algorithms for multiterminal quantizers. A novel design concept for 
distributed source coding was presented in [16] where basic tools from fundamental number theory, specifically 
Diophantine analysis, are used to construct index assignments capable of exploiting statistical properties common to 
many important source models. Beyond these contributions, highly evolved iterative channel coding techniques such 
as low density parity check (LDPC) and turbo codes have been applied to the distributed source coding problem [28], 
reaching the fundamental limits of Slepian and Wolf [24]. 

Despite these important contributions, very little is known on how to perform distributed compression in large- 
scale sensor networks (i.e. with hundreds of sensor nodes). The main reason for this is that most approaches become 
infeasible when the complexity of joint decoding or the complexity of a joint design of separate encoders is considered 
for a large number of correlated sources. Previous work towards this goal produced a scalable solution for the 
decoding side by running the sum-product algorithm on a carefully chosen factor graph approximation of the source 
correlation [2]. In this paper, we present a scalable solution which includes the encoding side. The main idea is to 
reduce the number of quantization bits in a systematic way, exploiting correlation preserving clusters, which minimize 
the Kullback-Leibler Distance (KLD) between the given source statistics and a factor graph approximation. Our main 
contributions are as follows: 

• Design of Low-Complexity Distributed Source Codes: We propose a methodology to design quantizers for a very 
large number of sensors (> 100) which exploits the spatial correlation between sensor measurements. Inspired 
by [10] we formulate a generalized index-reuse optimization algorithm which allows us to reduce the number 
of bits for data transmission by adding to our system a coarse quantization stage. 

• Source-Optimized Clustering: We devise a hierarchical clustering algorithm that uses the joint probability density 
function (PDF) of the sensor measurements to partition the set of all sensors into clusters and prove that the 
complexity of quantizer design can be reduced significantly. 

• Combination with Factor Graph Decoding: We show how source-optimized clusters used for distributed source 
coding can be incorporated in a KLD optimized factor graph which, in turn, is used at the decoder to exploit 
source correlations in a computationally tractable way. 

• Simulation Results: We show how our techniques can be applied to general sensor network scenarios as well as 
the so-called CEO problem [3] and provide numerical results for setups with 100 encoders. 

The rest of the paper is organized as follows. In Section [n] we give a precise formulation of the problem setup 
and describe the underlying system model. In Section [Hi] we present a technique to optimize quantizers exploiting 
correlations in the source observations. Section ITVl describes our scalable solution based on source-optimized hierar- 
chical clustering in sensor networks. The results of numerical experiments are discussed in Section [V] The paper is 
concluded in Section [VTl 



II. System Setup 

We start by introducing our notation. Random variables are always denoted by capital letters, e.g. U, where its 
realizations are denoted by the corresponding lowercase letters, e.g. u. Vectors are denoted by bold letters and, if 
not stated differently, assumed to be column vectors, e.g. u = (wi, U2, . . . , un) t and U = (Ui, U2, ■■■ , Un) T ■ The 
expression Ojv = (0,0, ...,0) T is the lengfh-iV zero vector. Matrices are denoted by bold capital letters, e.g. A, 
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Fig. 1, System model. N correlated sources are encoded independently and decoded jointly. At each encoder e n , n £ {1, 2, , . , , N}, the 
observed source symbol u n is encoded onto the codeword w n and communicated to the joint decoder <E> at rate R n . In the first stage of encoding, 
the discrete source index i n is obtained from u n by the scalar quantizer q n and, subsequently, w n is obtained by the index assignment m n 
such that e n = m n o q n . After perfect transmission the joint decoder uses the vector of received codewords w = [wi , W2, • • • , wn) t and its 
knowledge about the source statistics p(u\, U2, . . . , Un) to jointly form the estimates u = (ui, U2, . . . , un) T ■ 



where its determinant is referred to by the usage of vertical bars, e.g. |A|. The expression I at is the N x N identity 
matrix. It is always clear from the context, or stated explicitly, if a bold capital letter refers to a vector of random 
variables or to a matrix. Index sets are denoted by capital calligraphic letters, e.g. M, unless otherwise noted, where 
the set's cardinality is referred to by the usage of vertical bars, e.g. \M\. We follow the convention that variables 
indexed by a set denote a set of variables, e.g. if M = {1, 2, 3} then w/v" — U2, ^3}, and use the same concept to 
define vectors of variables, e.g. u>/ = (ui, «2, U' i ) T . Furthermore, the entries of a vector are referred to by specifying 
its index within paretheses, e.g. u(0) refers to the first and u(iV — 1) to the last entry of the length- iV vector u. 

The covariance is defined by Cov{a, b} = E{ah T } — E{a}E{h} T , where E{-} is the expectation operator. 

An iV-dimensional random variable with realizations u = (ui u%,--- ,un) t £ is Gaussian distributed with 
mean fi = E{u} and covariance matrix £ = Cov{u, u} when p(u) is given by 

p(u) = exp(-i(u - m) T S -1 (u - /z))/((2 7 r) JV |S|) 1 / 2 . (1) 

Such a PDF is simply denoted as jV(/i,E). 



A. System Model 

We consider a setup of iV independently operating sensors. In this setup each sensor indexed by n G M, M = 
{1,2, ••• , N}, observes a continuous-valued source sample u n (t) at time instant t. For simplicity, only spatial 
correlations between measurements and not their temporal dependence is considered such that the time index t is 
dropped and only one time instant is considered. The vector of source samples u = (ui,U2, ■ ■ ■ ,un) t , u G M. N , 
at each time instant t is assumed to be one realization of a iV-dimensional Gaussian random variable distributed 
according to jV(Ojv,R) with the vector of mean values fi = On and the covariance matrix E set equal to the 
correlation matrix 

1 Pi, 2 ' • ■ Pl,N 
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such that the individual source samples u n , n G TV, have zero mean E{u n } — 0, unit variance Cov{u n ,u n } — 1 and 
are correlated with Um, rn 7^ n, m G J\f, according to the correlation coefficient p n ,m = Cov{u„, u m }. Gaussian 
models for capturing the spatial correlation between sensors at different locations are discussed in [22] and models 
for the correlation coefficients of physical processes unfolding in a field can be found in [8]. 

We assume that the sensors are low-complexity devices consisting only of a scalar quantizer followed by an index 
assignment stage, see Figure Q] Specifically, we consider the following encoding procedure for each sensor n G M: 

In the first step, the observed source samples u n G R are mapped onto quantization indices i„ G 1 n , l n — 
{0, 1, . . . , \1 n \ — 1}, by the quantization function q n : R — > I n such that i n — q n (u„). During quantization, an 
input value u n is mapped onto the index i n if it falls into the interval B n (i n ) C R between the decision levels 
b„(i n ) and b n {i n + 1) such that b n (i n ) < u n < b n (i n + 1), see Figure [2] The obtained quantization index i n is 
then associated with the reconstruction level u n ,i n G U„, U n = {Un,o, U n ,i, • ■ • , u n i;r n |-i}> representing all source 
samples u„ falling into the quantization region B n (i n ). We consider PDF optimized quantizers such that the mean 
squared error (MSE) E{\\U n — U n \\ 2 } = E{(U n — Un) 2 } = JJ° = _ tx> (Mn — u n>qn ^ Uri )) 2 ■ p(u n ) du n within the 
observations is minimized, see e.g. [13]), which implies that the reconstruction levels chosen to be the 

centroid (conditional expected value) of the quantization region B„(i n ), i.e. u n ,i n = E{U„\i n } for all i„ G T n . 

In the second step of encoding, the obtained quantization index i n G T n is mapped onto the codeword w n G W n , 
W n = {0, 1, • . • , |VV n | — 1}, by the mapping function, also called the index assignment, m n : T n — * VV„ such that 
w n = rn n (i n ). We define the mapping function to be surjective, i.e. for any w n G Wn there exists at least one 
i n G l n such that w n = m n (i n ), for n = 1, 2, N. This property shall be important later on. 

In summary, the encoder of each sensor operates in a sequential way and the overall encoding function can be 
expressed as e n = m n o q n such that w n — e n (u n ) — m n (q n (u n )). The data rate at which the codewords w n are 
transmitted to the decoder is defined as R n = [log 2 (|W n |)] [bit]. 

Assuming data transmission over an array of N ideal channels, the decoder uses vector of codewords w = 
(twi, u>2, ■ ■ ■ , wn) T G W, W = rin=i Wn, and available knowledge of the source correlation R to form the 
estimate u = (ui, u%, ■ ■ ■ , un) T , u G R , of the originally observed source samples u G M. N . The decoding 
function is defined as $ : W — > R^ such that u = $(w). Assuming that the MSE E{\ |U — U| | 2 } between 
the estimates U = (Ui, U2, ■ ■ ■ , Un) t and source samples U = (Ui, U2, ■ ■ ■ , Un) t is the fidelity criterion to be 
minimized by the decoder, we observe that 

JV JV 

£{||U-U|| 2 } = E{(U - U) T • (U - U)} = E{J2(Un - U n ) 2 } = J2 E i(Un - Unf}, (2) 

n— 1 n—l 

which shows us that i5{||U — U|| 2 } can be minimized globally by local minimization of the terms E{(U„ — U n ) 2 } 
for n = 1, 2, . .. , N. The optimal estimate u n (w) for a given codeword vector w, i.e. such that E{(U n — Un) 2 } is 





- B n {i n ) - 
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b n (i„ - 1) b n (i n ) b rl {i„ + 1) 

Fig. 2. Scalar quantization. The source samples u n £ K are mapped onto the index i n G X n if they fall into the quantization region B n (i 
such that b n {in) < u n < b n (i n + 1). Those samples, i.e. all u n G Bn(in), are then represented by their reconstruction level u n .i n £ U n 
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minimized globally, can be obtained by conditional mean estimation (CME), see e.g. [18], such that 

Mn(w) = E{U n | W} = S2 E{Un\in} ■ p(in\w) = Un,i n • p(in | w) (3) 

i„=0 in=0 

where equality (a), as derived in Appendix [J allows us to express the estimate u n (w) as a function of E{U n \i n } 
and, thus, as a function of the reconstruction levels assuming that U n ,i„ = £{[/ n |in} as considered in (b). 
The required posterior probabilities p(i n = l\w) can be derived by 

p(i„ = Z|w) = 7 • p(i n = Z, w) = 7 ■ ^2 p(w, i), (4) 

ViSX:i„=i 

where the Bayes rule was applied in (a) using the constant 7 = l/p(w) for normalizing the sum over all probabilities 
to one and in (b) we calculate p(i„ = Z,w) from p(w, i) by marginalizing over all possible realizations of i = 
(ii , 12, ■ ■ ■ , ijv) T , i 6 I, 1 = ]~[ VngA( rX n . It is possible to express p(w, i) in terms of the probability p(i) known 
apriori from the source statistics and the transition probabilities p(w n \i n ) known from the index assignments m n for 
n = 1,2, ...,iV such that 

p(w, i) = p(w|i) • p(i) ( = p(i) • ]^[ p(ty n |«n), (5) 

VneJV 

where the Bayes rule was applied in (a) and (b) takes into account that the index assignment operation performed at 
each encoder is independent from the other encoders. The probability mass function (PMF) p(i) of the index vectors 
i can be obtained by numerically integrating the source PDF p(u) over the quantization region defined by S„(i n ) 
for all encoders n — 1,2, . . . , N . Alternatively, one can resort to Monte Carlo simulation or approximate p(i) by 
other means. Considering implementation issues it is worth pointing out that the transition probabilities p(w n \i n ) are 
either zero or unity since the mapping m n from the indices i n to the codewords w n is a function (i.e. knowledge 
of the index i n implies knowledge of the codeword w n ). Thus, the product of the transition probabilities in l[5} is 
also zero or unity, a fact, which can be exploited for an efficient implementation of the marginalization as shown in 
Appendix |n] 

The complexity of optimal decoding is analyzed in Appendix IIIII and using the derived result we are able to state 
that the computational complexity of calculating all estimates according to l[3} is of 0(NF N ) where F < L — K + 1 
is a system specific parameter depending on the characteristics of the mapping functionsQ 



B. Our Goals 

Under the system model described above, our first goal is to find distributed source coding algorithms that, by joint 
design of the index assignments, offers a suitable solution for large numbers of encoders. Inspired by the work in 
[10], we formulate a generalized index-reuse algorithm to construct, for subsets of encoders, distortion-optimized 
index assignments suitable for distributed source coding. 

Since optimal decoding according to is not feasible for large number of sources, part of this work shall be 
devoted to sub-optimal, yet feasible, decoders based on the principles presented in [2]. 

We shall show that source-optimized clustering algorithms can be a key enabler towards the goal of obtaining 
both a scalable encoding and decoding solution feasible for large-scale sensor networks. 

'it is worth pointing out that decoding according to {5J has only to be performed, in principle, only once for each realization w G W and 
that the calculated estimate could be then stored in the form of a decoding table for n = 1, 2, . . . , jV. Thus, the decoding operation would 
reduce to a mere table look-up, i.e. decoding of JV sources would be of computational complexity O(N). However, such a decoding table itself 
has a space complexity of 0(NK N ) and the computational complexity for creating it would be NK N times the complexity of decoding a 
single source, i.e. it would be of 0(NK N F ). Therefore, the concept of using a decoding table shall be discarded throughout this work. 
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III. Index Assignment Design 

The distributed source coding concept followed throughout this work is characterized by the fact that it can be 
represented by a simple index assignment stage, i.e. by a one-to-one mapping from the quantization indices i n G 2 n 
to the codewords w n G W„ such that w n = m n {in) for all n G J\f . Considering this low-complexity approach, 
distributed compression can be achieved by choosing \W n \ < \In\, i-e. whenever we have fewer codewords than 
quantization levels^] Thus, the data rate can be reduced from R' n = [log 2 |X„|] to R n = [log 2 |W n |] [bits/sample] 
for several encoders n G TV. The goal is to jointly design such index assignments such that the end-to-end distortion 
for an arbitrarily chosen subset of sources *]/ C J\f is minimized. The design procedure presented in the 
following was inspired by [10] where a coding solution for two correlated observations was presented. In this work, 
we generalize the corresponding design algorithm to construct distortion optimized index assignments for an arbitrary 
subset of encoders Q C J\f. It is worth mentioning that, in principle, several other methods can be used to construct 
suitable index assignments, e.g. those based on syndromes [19], on Diophantine index assignments [16] or even on 
random index assignments. However, the presented method has the advantage that it is very versatile and that the 
distortion itself serves as an optimization criterion within the design. 

A. Optimization Criterion 

The figure of merit for our optimization procedure is the minimization of the end-to-end distortion for an 

arbitrarily chosen subset of sources $ C # which, for the case of a MSE distortion metric, can be expressed as 
follows: 

d(#) = £{||U* -U*|| 2 } = £{(U* -U*) T • (U* - U*)} 

= E{ ^2 0n - U n ) 2 } = ^2 E iftn - U n ) 2 } (6) 

Vug* VnS* 

where denotes the vector of considered source variables U n and, equally, denotes the vector of estimates 
U n considered within the calculation, n 6 $. In Appendix IIVI it is shown that the distortion associated with each 
source d(n) = E{(U n — U n ) 2 }, n 6 <]/, can be expressed as follows 

d(n) =E{(U„ -U n f} + E{(U n -U n ) 2 } (7) 

where the reconstruction levels of the quantizers are assumed to be the centroid of the quantization cells such that 
u n ,i n = E{Un\in} for all i n G I n . We see that the distortion d(n) consists of two components where d q (n) = 
E{(U n — U n ) 2 } is the component directly resulting from the finite granularity of the scalar quantizer q n , n 6 $, 
and 

d d (n) =E{(U n - Un) 2 } = P(i*)-(wn(w T )-Wn,i„) 2 (8) 

is the component mainly affected by the choice of the estimate i„(wr) for the vector of available codewords 
wt G Wr, Wr = IlvieT^' T — ^ e l al;ter depends directly on the configuration of the index assignments 
mi of all encoders I G T; compare e.g. ® for the case where CME is considered for decoding. For the design of 
the index assignments, as presented in the following, we set T equal to f2, i.e. the codewords from all encoders Q, 

2 Such index assignments generally increase the distortion of the system, because information is lost during the mapping process, i.e. more 
than one quantization index i n might lead to one and the same codeword w n . However, since the rate can be reduced considerably, this method 
offers a way to achieve a wider range of rate/distortion trade-offs. 
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are assumed to be known. Based on {7J, we can state that the distortion of all sources n £ $ can be expressed by 
the sum 

d(¥) = d W = (d q (n) + d d (n)) = £ d q (n) + d ^ n ) ^ 

Vng* Vng* Vng* Vng* 

where dq^) = X^vng* d q (n) is caused by the quantization stage and dd(^) = X^vng* dd(n) is caused by the index 
assignment stage. It is worth pointing out that the calculation of d q (^) does not take into account any knowledge 
about the actual configuration of the index assignments which is very helpful for design purposes, as presented next. 



B. Index-Reuse Algorithm 

The basic idea underlying the presented algorithm is to construct index assignments in an iterative fashion. In each 
step of this procedure, the number of output codewords is reduced such that, in general, more than one quantization 
index is assigned to each codeword index. This means that the codeword indices are reused, while considering the 
resulting end-to-end distortion as the optimization criterion. 

Starting with bijective mappings between the quantization indices i„ and the codewords w n , where the number of 
codewords is equal to the number of quantization indices, i.e. \T n \ = |WW|, the algorithm subsequently modifies the 
mapping functions m n for all considered encoders n £ Q by merging two codewords (or, equivalently, the originating 
quantization indices) to a single new codeword. This is repeated until the targeted number of codewords, denoted as 
K, is reached. In each step of the procedure, the algorithm chooses the merging from all possible candidates yielding 
the minimum distortion = dofvE') + drf(^) calculated for the set of considered sources ^ where only dd('I') is 
affected by the index assignments|j 

For a detailed discussion of the algorithm, we assume that |X„| = L and |W n | = L in the beginning of the 
procedure and that \W n \ = K, K < L, at the end of the procedure, for all n £ $1. For implementation purposes, 
we assume that W n = {0, 1, . .. , |VV»| — 1} and represent the mapping functions m n : 1 n — ► Wn by vectors 
fn £ w! 1 "' such that the codeword w n can be obtained from the index i„ by simple vector referencing where 
w n = fn(in) for all i n € T n and for all n £ Q,. The merging of two codewords w n = a and b within the vector 
f n shall be described by the merging function g : wi 1 " 1 x W n X Wn -> vi 1 " 1 where V„ € {0, 1, . .. , \W n \ - 2} 
is the resulting codeword alphabet with a reduced number of codewords and the vector e„, describing the resulting 
mapping, can be obtained from f n by e„ = g(f n , a, b). Assuming that at the initialization of the algorithm the vector 
f n was initialized such that f„ = (0, 1, . . . , \T n \ — 1) T and that a < b, then it is easy to show that e„ can be obtained 
from f„ by performing the following assignment for i n = 0, 1, . . . , \2 n \ — 1: 

a , for f„(i„) = a or f„(i„) = b 

3n(in) = { f n {in) - 1 ,fOtf n (i n )>b 

fn(*n) , otherwise. 

Let En = {fm :m£ n,m/n}U { e n} be the collection of mapping vectors after merging two codewords in f„. 
We use the notational convention that dd{^,£n) can be used to indicate that the distortion dd{9) according to ^ 
was calculated based on those mapping functions. A detailed formulation of the whole procedure can be found in 
Algorithm Q] 

Since this particular property was required in the problem setup of Section [TTJ it is worth pointing out that the 
presented algorithm constructs index assignments that are surjective functions. In the initial step of the algorithm the 
index assignments are assumed to be bijective functions which, by definition, are also surjective. In each further step 

We note that the search algorithm is not optimal due to the single-step nature of the optimization. 
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Algorithm 1: Index-Reuse Optimization Algorithm 



1 Initialization 

2 • start with one-to-one mapping 

3 f„ <- (0, 1,...,L - 1) T , for all n e 

4 • set initial number of codewords 

5 k <— L 

6 Main Loop 

7 while ffc > il'J do 
for (n e do 

• set reference distortion to maximum 

d* <— oo 
for fa = 0,1,..., fc-2J do 

for (b = a + l,a + 2,...,k - 1) do 

• merge cell a and cell b within f „ 
e„ = g(f n , a, b) 

• calculate resulting overall distortion 
d = d d {V,E n ) 

if (d < d*) then 

• save current mapping and distortion 
d* <- d 

• reduce number of codewords by one 

fe <- k- 1 



of the procedure two codewords in the original mapping are mapped (merged) onto a single new codeword. It is easy 
to see that this corresponds to the case where the indices that were mapped to either one of the original codewords 
are now mapped onto the newly created codeword. Thus, the assignment is still a function, since the involved indices 
are still mapped onto a codeword, and it is also surjective, since for the newly created codeword there always exist 
some indices that are mapped onto it. This is valid for each step of the procedure and, by induction, the mapping 
created after any number of steps is (still) a surjective function. 

For the important case where the set of considered sources is equal to the set of considered encoders, i.e. when 
= £7, the complexity of the optimization algorithm is discussed in detail in Appendix [V] It is shown that the 
algorithm can be implemented with a computational complexity that grows exponential with |fi| making it feasible 
only for a small number of encoders |fi|. A reasonable way to decrease the overall complexity for a large number 
of encoders is to form clusters of encoders and optimize each cluster separately, as explained in the next section. 
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IV. Source-Optimized Clustering 

The need for a computationally feasible code design motivates us to partition the entire set of encoders into subsets 
(clusters). The encoders can then be optimized within each cluster, thus, reducing the optimization effort for the 
encoding side. Moreover, this clustering and coding strategy can be easily combined with the scalable decoder 
presented in [2] which relies on a carefully chosen factor graph model and allows for joint decoding of the data sent 
by all encoders. The key towards computationally feasible joint decoding is for the decoder to use an approximated 
PDF p(u) instead of p(u) as basis for efficient decoding considering only the statistical dependencies within certain 
subsets of sources. Therefore, it becomes crucial to build the decoding model and the source clusters alongside to 
ensure that statistical dependencies, which are exploited during encoding to reduce redundancy within the clusters, 
are still available at the decoder to compensate for the information loss imposed by the index assignment stage. In 
[2] the Kullback-Leibler distance (KLD) was deemed to be a suitable measure to estimate the impact of the chosen 
decoding model onto the overall system performance, i.e. the MSE distortion. Since we are interested in minimizing 
the overall system MSE, we chose the KLD as optimization criterion to find not only a suitable source approximation 
but also adequate clusters. 

A. Preliminaries 

The PDF p(u) can be approximated by assuming a factorization of the form p(u) = IIm=i /«i( u s m ) where S m C Af 
for 77i = 1,2, ...,M are subsets of source indices such that Um=i ^ m = Since generally p(u) 7^ p(u), the 
resulting PDF p(u) is an approximation of p(u). 

Specifically, we shall consider constrained chain rule expansions (CCREs) of p(u) that can be obtained from the 
regular chain rule expansion by removing some of the conditioning variables. More formally, a factorization 

M M 

p(u) = n /™( u s.j = n p(™A m \u Bm ), do) 

m— 1 m — 1 

where Am, B m and S m = An U/3 m are subsets of the elements in TV, is a CCRE of p(u), if the following constraints 
are met for m = 1,2,..., M: 

M m-1 

An n B m = 0, (J An = M, B m c (J A. (11) 

m = l 1 = 1 

Notice that the set Bi is always empty and that B m = UI^i 1 holds for the usual chain rule expansion. We call 
a CCRE symmetric if any B m with m = 2, 3, • • ■ , M is a subset of Si for some I < m. 

The Kullback-Leibler distance (KLD) between a PDF p(u) and its approximation p(u) is defined as 

D(p(u)||p(u)) = /■■■/ P(u) log 2 gj^j du, (12) 

e.g. see [7], which can be used as optimization criterion when constructing source factorizations. In [2] it was shown 
that the KLD can be calculated explicitly for CCREs of Gaussian PDFs A/"(0jv, R.) as follows 

1 M 

£(p(u)||p(u)) = --log 2 |R| + ^2 AD(5 ra ,/3 m ) (13) 

m=l 

where the KLD benefit obtained by introducing the factor p(u^ m |us m ) is given by 

AD(5 m ,B m ) = ilog 2 ^J ( 14) 

where Rs m as well as Rs m are the covariance matrices of the Gaussian PDFs p(us m ) and p(ug m ), respectively. 

It is worth pointing out that a source factorization according to J X Oi l can be used directly for an efficient decoder 
implementation as discussed in Appendix I VII In particular, this holds for the case where the number of variables in 
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the factors is bounded such that |<S m j < S for m = 1, 2, . . . , M. A complexity analysis for scalable decoding based 
these assumptions can be found in Appendix I VII It is shown that the computational complexity for the case where 
\B m | = 1 farm = 1,2,..., M is of 0(MSF s ). In the other cases the computational complexity is of 0(TMSF ) 
where T > 1 specifies the maximum number of iterations used for decoding. Notice that M, the number of factors in 
the factorization, is considered as a parameter here. However, it shall be shown later in this work that M < 27V + 1 
holds. 

B. Clustering Algorithm 

The clustering algorithm described in the following is based on the principles of hierarchical clustering [12] and can 
be seen as a variant of the Ward algorithm [27], The goal is to cluster the set of sources N into subsets A c C M 
such that Uvcer A c = A/ and A* nAj = for all i =fc j with {i, j} e F where T = {1, 2, . .. , C} is the set of cluster 
indices and C = |T| is the number of clusters. The maximum cluster size S is assumed to be given and defined such 
that |A C | < S for all c G T. 

The clusters itself are constructed by a successive merging process. The algorithm starts with a set of single- 
element clusters such that A' s = {s} for all s 6 T' where V' = Af — {1, 2, . . . , TV} is the initial set of cluster indices. 
In each of the following steps two of those clusters are selected and merged into a new cluster. The clusters are selected 
using the KLD D(p(u)||p(u)) between the original PDF p(u) and the approximated PDF p(u) = ]~[vser' p( u a' s ) 
as an objective function where p(u) directly results from the current choice of clusters and D(p(u)||p(u)) is defined 
analog to d!2t . For each possible pair of clusters (A' fe , AJ) with k I and {k, 1} £ T', the algorithm determines the 
current value of the objective function to find the pair (A' fc ,Aj) leading to the smallest KLD between original and 
approximated PDF. The indices of the selected clusters (k, I) are then removed from the current set of cluster indices 
r' whereas the index of the newly created cluster r is added to it. This procedure is repeated until only a single 
cluster remains and a history of all mergings performed during the different stages of the optimization procedure is 
obtained. 

Using d!3t , it is possible to show that the overall KLD can be calculated as follows 

D(p(u)||p(u)) = -ilog 2 |R|+ A -°( A -0)> d5) 

Vser' 

where AD(Ag,0) is the KLD benefit imposed by an arbitrary cluster A^. Since the objective function has to be 
evaluated many times during the optimization process, it is useful to express Q3} in terms of intermediate results 
to reduce computational complexity. The differential KLD benefit created by merging an arbitrary pair of clusters 
(AJ., A() with k ^ I and {k, 1} 6 T' into a new cluster can be expressed as follows 

AD'(A' fe ,A;) = A7J(A' fc U A;,0) - A£>(A' fe ,0) - AD(AU), (16) 

which can be used to locally evaluate the impact of the considered merging onto the overall KLD given by {B). 
Assuming that t is the number of mergings performed at a certain stage of the procedure, then the expression 

D(p(u)||p(u)) = -- log 2 |R| + AD'(A' fc(3) , A; {3) ) 

s = l 

can be used to evaluate the overall KLD in J 1 5 b based on the differential KLD benefits in | |16I > only. 

A detailed description of the entire procedure can be found in Algorithm[2]where r labels the clusters in ascending 
order and h (a two-dimensional array) is used to store a history of the mergings performed during different stages 
of the clustering procedure. In Figure (3]a) the merging process is illustrated for an exemplary scenario. A graphical 
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Algorithm 2: KLD optimized clustering 



1 Initialization 

2 • start with one-element clusters 

3 r^{i,...,7v} 

4 A; <- {s}, for all s E V 

5 t<- 1, r <- N + t 

6 Main Loop 

7 repeat 

8 • find the pair of cluster (A' k , A[) with k ^ ? and {k, Z} e T' 

9 such that \AD'(A' k , AJ)| is maximized 
to • store intermediate results: 

a;-a',ua; 

12 • delete original clusters from index list: 

13 r' <-T'\{k,l} 

14 • add new cluster to index list: 

is r <- r u {r} 

16 • save clustering history: 

17 h(t, 1) <- fc, ft(i,2) <- / 

18 • update internal variables: 

19 i <-*+ 1, r <- iV + * 

20 until f|T'| = 1J 



representation of the mergings performed during different stages of the optimization, the so-called dendrogram [12], 
is shown in Figure [3jb). 

Using the dendrogram derived before, the source clusters A c with a maximum cluster size of S can be constructed. 
We start at the root of the dendrogram, which is basically a tree, and descend along its branches to lower hierarchical 
levels. While moving from one level to the next lower one, the dendrogram branches into two subtrees. The number 
of leafs, i.e. the number of sources connected to each subtree are counted and if the number of leafs of one (or both) 
subtree(s) is smaller or equal to 5, we cut the corresponding subtree out of the dendrogram. This pruning process 
is repeated until all leafs are removed from the dendrogram. When the pruning is finished, the subtrees are labeled 
by the successively increased index c — 1,2, ... ,C. The source clusters A c , c £ T, can then be determined from the 
subtrees by assigning the variables n £ N (associated with each of the subtree's leafs) to the corresponding cluster. 
The overall KLD D(p(u)\\p(u)) between the original PDF p(u) and the approximated PDF p(u) — ]~IvcerP( UA c) 
can then be calculated based on the resulting clusters 

D(p(u)|]p(u)) = ~ log 2 |R| + Y, AD ( A o 0) ( 17 > 

Veer 

where D(p(u)||p(u)) is defined analog to d 1 2b . In Figure [51b) the pruning process is illustrated for the previous 
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Fig. 3. Example. Source-optimized clustering procedure with TV = 9 uniformly distributed sensors picking-up observations ui,..., Ug: (a) 
Mergings performed during the hierarchical clustering procedure for the first four iteration steps leading to resulting clusters A' r with indices 
r = 10, ■ ■ ■ , 13. (b) Tree representation of the mergings performed during different the stages of the optimization process (dendrogram). The 
KLD benefit |AD(AJ, ; 0)| in [bit] imposed by the clusters h! r is provided for all iteration steps. The branches of the tree that are cut during 
the pruning process with a maximum cluster size of S = 4 are marked with a cross. After pruning, the source clusters Ai = {3}, A2 = {8}, 
A 3 = {2}, A 4 = {1, 6} and A 5 = {4, 5, 7, 9} can be defined. 



example. 

Because of the hierarchical merging concept based on local decisions, the proposed clustering algorithm is in 
general sub-optimal. However, the hierarchical approach has the advantage that the resulting dendrogram can be used 
elegantly to construct clusters with a bounded number of source variables 

In Appendix I VIII it is shown that the computational complexity of source-optimized clustering (evaluated in a 
very pessimistic fashion) is of 0(N 5 log N) which makes the overall procedure feasible for medium to large values 
of N. Furthermore, it is easy to show that the number of clusters C with a maximum cluster size of S is bounded 
according to C < [-^J which shall be required in the next section. 

C. Source-Optimized Factorization 

In the last section we have shown how to construct KLD optimized clusters fitting our purposes. The second step 
towards our goal of obtaining a source factorization is to transduce the derived clusters into a symmetric CCRE of 
the form p(u) = IIm=i f m i Us m) matching the conditions in Jilt . This can be achieved by linking the clusters A c , 
c 6 r, successively together. 

The basic principle of the linking procedure is as follows: After choosing a specific cluster as starting point for the 
procedure, select one of the unconnected clusters (i.e. a cluster which is not yet considered in the source factorization) 
and link it with the already connected clusters (i.e. incorporate it into the source factorization). Assuming that cluster 
r € r was chosen as the starting point for the optimization, we can define a set of linked clusters F' = {r} and a 
set of unconnected clusters V = r\{r}. At each step of the procedure a cluster k £ V and a cluster I € P are 

4 With partitional clustering methods, see e.g. [12], this would be an arguably difficult task. 
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selected. The index / is added to the set of linked clusters, i.e. r' = V U {I}, and removed from the set unconnected 
clusters, i.e. P = P\{Z}. This is repeated until all clusters are linked, i.e. |r'| = |T|. 

More specifically, two clusters {k, /} 6 f are linked by choosing a set of variables Vk ^ A^ and a set of variables 
Qi C Aj. These sets will form the basis of the factor introduced into the source factorization. Since the complexity of 
scalable decoding is highly dependent on the number of variables within the single factors of the underlying source 
factorization (see Appendix IVII or [2] for details), we introduce the design parameters A and B such that \Vk\ < A 
and | Qi | < B for all {k,l} e T, 

The source factorization starts with a single factor p(ux r ) containing the variables of the initially chosen cluster r, 
i.e. T r — A r . While establishing a link between the two clusters k and I, the factors p(uq [ \u-p k ) and p(uj l |uq, ) are 
added to the source factorization where I; = A;\Q;. As the clusters are linked, a source factorization is constructed 
step-by-step where the running index d is used to index the added clusters. The resulting source factorization can 
then be written as 

c 

p(u) =p(u Ai(d=1) )n ( p(uQ i(d) |uy fc( d-i))f( u ^ w l u a iW ) ) ( 18 > 

(a) (b) (c) 

where A;( d=1 ) = A r = Tud=i) = 1 r and all factors with Ii(d) = for d = 2, ■ ■ ■ , C are discarded. Notice 
that, when constructed according to the aforementioned linking procedure, there exists a one-to-one correspondence 
between the running index d and the cluster indices c £ V. Moreover, it can easily be shown that d 1 8 fc fullfills the 
criteria of CCREs by verifying the conditions in Jilt . 

After discussing how source factorizations fitting our purposes can be constructed, we are ready to show how 
to choose the subsets Vk and Qi and in which order the clusters should be linked such that the overall KLD 
D(p(u) | |p(u)) defined analog to d 1 2b is minimized. 

It is easy to show that the KLD of the source factorization given in d 1 8b can be expressed as 

D(p(u)||p(u)) = -ilog 2 |R|+AD(A !(d=1) ,0) 

a 

+ J2 (^D{T Hd -i) U Qi W ,Vk {d -i)) + AD(A l{d) , Q i(d) )). (19) 

d=2 

The KLD benefit imposed by the factors (c) in d 1 8b can be written 

AD(A(, Qi) = i log 2 = i log 2 |R A , | - i log 2 |R S , 

= AD(A ; , 0) - AD(Qi, 0) (20) 

since the covariance matrices Ra, and Rg, are symmetric and positive-semidefinite and, thus, the determinants 
|Ra, I and |Rq, | are non-negative. Similarly, the KLD benefit imposed by the factors (b) in J 1 8b can be expressed as 

AD(V k U Qi, Vk) = AD(V k U Qi, 0) - AD{T k , 0). (21) 

Considering the KLD benefit in J20b and l !21b . we notice that AD(A;,0) in ( |20b already was considered during the 
cluster optimization in Section ITVl . Thus, we are able to define the KLD benefit of establishing a link based on the 
sets Vk and Qi as 

AD*(Vk, Qi) = AD(Vk U Qi, 0) - AD(V k , 0) - AD{Q h 0). (22) 

Using 07}, the KLD of the source factorization in d 1 8 b can be written as 

a 

D(p(u)\\p(u)) = D(p(u)||p(u)) + AD\T k(i -i), Q m ), (23) 

d=2 

decoupling the link optimization from the cluster optimization. 
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If an already linked cluster k is to be connected to a cluster / in a KLD optimal way, then the sets Vk C At and 
Qi C Ai have to be chosen such that the KLD benefit AD*(Vk, Qi) according to i22\ is maximized in magnitude. 
The set of all possible subsets V' k C Ak with \V' k \ = A is denoted as T(A,A k ) and the set of all possible subsets 
Q'l ^ Ai with \ Q[\ = £? is denoted as T(B,Ai). Vk and Q; are therefore denned as 

(V k ,Ql) = argmin { AD* (V' k , Q[)\ (24) 

Q',eT(B,Al) 

and we define link cost as 

c fe ,i = AD*{V k ,Qi). (25) 

Notice that generally Ck,i 7^ cj^. 

To determine how the clusters are to be linked (i.e. which clusters are to be linked and in which direction), a 
graph can be constructed representing the KLD optimal links between the clusters. The vertices of the graph are 
obtained by contracting each cluster A c , with c € F, to a single vertex v c and defining the set of vertices as 

V = {v c : c € T}. 

The set of all possible directed edges ek,i = (vk, vi) between the vertices v k and vi, {k, 1} £ T, is defined as 

£ = {e k .i = (v k ,vi) : {k,l} €T,k^ I}, 

where the cost Ck,i of each edge e^,; in terms of KLD benefit is given by l |25t . A fully connected graph Q — (V, £ ) 
is thus obtained. Provided that the clusters are considered fixed, the overall KLD of the source factorization can 
be optimized solely by optimizing the cluster links, please refer to J23l >. which are in turn represented by the directed 
edges in Q. The optimization problem therefore reduces to the Minimum (cost) Directed Spanning Tree (MDST) 
problem for which first algorithms were found by Chu and Liu [6] as well as by Edmonds [9] to be generalized later 
by Georgiadis [11]. After applying one of these algorithms to the fully connected graph Q, the MDST Q' = (V, £') 
with £' C £ and its root vertex (i.e. the vertex v r G V which only has outgoing edges) can be found. The source 
factorization i I St can then be constructed by moving along the edges of the obtained tree Q' (possibly inspired by a 
Depth-First Search, see e.g. [1, p. 484]) and linking the clusters corresponding to the visited vertices together. More 
specifically, the root vertex of Q' corresponds to the factor denoted as (a) in d!8t , the visited edges correspond to 
the factors denoted as (b) and the visited vertices correspond to the factors denoted as (c). Notice that this tree-based 
linking approach also conforms with the aforementioned linking procedure, which requires that links result only from 
already connected clusters, and thus guarantees a valid CCRE. 

Considering the previous example with clusters Ai = {3}, A2 = {8}, A3 = {2}, A4 = {1,6}, Ag = 
{4,5,7,9} and A = B = 2, we get the MDST Q' = (V, £') with V = {vi, ■ ■ ■ , v 9 }, root v B and £' = 
{(115, V4), (1)4,^2), (1)4,^3), (u4,wi)}. Figure |4] shows the corresponding source factorization. 

Appendix I Villi discusses the complexity of the source-optimized linking procedure and shows that the compu- 
tational complexity grows exponentially with S assuming that A = B = |-, which makes the overall procedure 
feasible for small cluster sizes S. Notice that in the last section it was shown that C < [-^J allowing us to represent 
the complexity only based on the system parameters N and S. It is easy to see that, because the linking procedure 
basically constructs a tree between the clusters, the number of factors M in the factorization l |10t can be bounded 
according to M < 2N + 1. This also means that the number of factors with a maximum size of S is at most 2N + 1, 
as used in Section HV-AI to analyze the complexity of the scalable decoder. 
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Fig. 4. Example. KLD optimized source factorization obtained by linking clusters with A = B = 2. The factor graph represents the symmetric 
CCRE p(u) = p(u4,u 5 ,u7,uq) ■ p(ui, ti6 1 «4, Kg) ■ p(«2|wi , "e) • p(us|«i, ue) ■ p(u3\ui,ue). 

V. Results and Discussion 

To underline the effectiveness and efficiency of our low-complexity coding and clustering strategies, we present 
numerical performance results for two scenarios with randomly placed sensors and two instances of the so-called 
CEO Problem [26]. 

A. Randomly Placed Sensors 

We consider a unit square with TV = 100 uniformly distributed sensors. The sensor measurements u n are Gaus- 
sian distributed according to 7V(0, 1). As outlined in Section III-AI we assume that sensor measurements u = 
(iti, tt2, . . . , un) T are distributed according to a multivariate Gaussian distribution A/"(0jv, Ft), where the correlation 
between a pair of sensors Uk and ui decreases exponentially with the distance dk.i between them, such that 
pk.i = exp(— j3 ■ dk,i). Since the performance of our techniques depend on the correlations between the sensors, 
we consider two different scenarios, one with (3 — 0.5 (strongly correlated sensor measurements) and one with /3 — 2 
(weakly correlated measurements). All scalar quantizers at the encoders are Lloyd-Max optimized to minimize the 
MSE in the sensor readings u n using identical resolution for quantization and identical rates for data transmission, 
i.e. \T n \ = L and R n = R for all n £ M, M = {1, 2, . . . , TV}, where L < 16 was chosen. The clusters A c C J\f 
indexed by c 6 T are derived as described in Section IIV-BI where a maximum cluster size of S — 4 was chosen, 
see Figure [5] The index assignments are then designed successively for all clusters with |A C | > 1 and c £ T by 
employing the IR algorithm described in Section [III] with * = Q — A c . Since it is not possible to construct index 
assignments for single-element clusters, we chose in this case a scalar quantizer (Lloyd-Max optimized as before) 
with decreased resolution and no index assignments such that R n — R is still guaranteed for all encoders n G N ■ 
The source factorization used for decoding is constructed as described in Section HV-CI assuming that A = B = 1, 
see Figure [5] The decoder is based on the sum-product algorithm as described in [2] where the required PMFs were 
obtained by Monte Carlo simulation using Lloyd-Max optimized quantizers with resolution L n = L for all n G Af. 
To evaluate the performance of the coding strategies, we measure the output signal-to-noise ratio (SNR) given by 




averaged over a A^x 10000 source samples. The simulation results of our system are depicted in Table|I]for strongly and 
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Fig. 5. Simulation scenario. Graphical representation of the KLD optimized source factorization for N = 100 uniformly distributed sensors with 
correlation factor /3 = 0.5. The clusters with a maximum size of 5 = 4 (indicated by circles) were created using the hierarchical clustering 
method and linked together by choosing A = B = 1. 



TABLE I 

Simulation results for N = 100 and (3 = {0.5, 2} 



R [bit] 


1 


P = 
2 


0.5 

3 


4 


1 


(3 

2 


= 2 

3 


4 


SNR Dcc [dB] 


4.44 


9.46 


14.61 


20.32 


4.45 


9.32 


14.65 


20.27 


SNR 1R [dB] 


11.07 


14.86 


18.29 


N.A. 


7.54 


11.72 


16.21 


N.A. 



weakly correlated sources. In both scenarios, we consider the performance achieved when using scalar quantization 
alone at the encoder, i.e. where the performance is mainly governed by the properties of the decoder (Dec), and 
the performance achieved when scalar quantization with a subsequent index-reuse (IR) is used for encoding. Table 
entries labeled as N.A. (not available) indicate that those instances could not be considered here due to their high 
computational demand|f| Notice that only the index assignments yielding best possible performance were chosen for 
the experiments (e.g. a rate of R = 1 [bits/sample] may be obtained from quantizers of resolution L = 4,8, 16). 

Our simulation results reveal that simple index assignment techniques applied to local clusters can achieve 
significant performance gains using our coding approach, especially for low data rates and strongly correlated sources. 

B. The CEO Problem 

In the following, we show the applicability of our techniques to another relevant sensor network model: the quadratic 
Gaussian CEO Problem [3]. Let uo be the output of a continuous-valued Gaussian source Uo. For all n 6 TV, 
M — {1, 2, ... , N}, let u„ denote noisy observations of uo which are corrupted by additive noise, i.e. u n = uo + n n . 
The noise samples are generated by Gaussian noise processes N n statistically independent over n. The observations 
u n are encoded and transmitted by independently operating encoders indexed by n. The main task of the CEO is to 
estimate no based on the data obtained from the encoders. In [15] we derived the optimal decoding rule exploiting 

5 This instance would require a high-rate quantizer with a resolution larger than L = 16. 
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TABLE II 

Simulation results for the CEO scenario with N = 100 encoders, source variance 07] =1 and noise- variances A 2 ={0. 1 , 

0.5}. 



R [bit] 


1 


2 


A 2 =0.1 

3 


4 


1 


2 


A 2 =0.5 

3 


4 


SNRR/ D [dB] 


28.66 


29.70 


29.93 


29.99 


21.72 


22.74 


22.96 


23.01 


SNR DcL .[dB] 


9.67 


19.37 


26.21 


28.48 


15.25 


20.84 


22.49 


22.74 


SNRi R [dB] 


22.71 


26.70 


28.21 


N.A. 


18.76 


21.56 


N.B. 


N.A. 



the special properties of this problem setup and studied a feasible decoder using a source approximation based on 
the factorization 

JV 

p(uo, Mi, Un) — p(uo) Y\ p(Un\uo) 
n=l 

which can be easily represented by a factor graph [14]. In the following, we consider a scenario of N — 100 encoders. 
The source process is Gaussian distributed A/"(so,co) with mean so = and variance a 2 = 1. The noise processes 
are Gaussian distributed TV(7n,A 2 ) with mean l n — and variance A 2 = A for all n £ TV where A = {0.1,0.5} 
was chosen depending on the considered scenario. All scalar quantizers at the encoders are Lloyd-Max optimized to 
minimize the MSE in the sensor readings u n using identical resolution for quantization and identical rates for data 
transmission, i.e. \I n \ = L and R n = R for all n 6 TV where L < 16 was chosen. We use the scalable decoder 
as described in Section [II] where the required PMFs were determined using Monte Carlo simulation with resolution 
\To I = 64 for the source uo and \T n \ — L for the observations u n for all n 6 TV. Notice that in case of our highly 
symmetric scenario, with \X n \ = L, A^ = A 2 and l n = 0, the probabilities p(i n |*o) can be considered identical for 
all n 6 TV. Therefore, the index assignments need to be designed only once for a single, arbitrarily chosen cluster 
A C TV with |A| = S where S — 4 was chosen. After employing the IR algorithm described in Section 11111 with 
Q — A and ^ = {0}, the resulting index assignments can be assigned repeatedly to all clusters within the system. 
To evaluate the performance of our coding strategies, we measure the output SNR for Uo given by 

Output SNR = 10 ■ log in (- U ° ^ in dB 

\{u - uo) 1 J 

versus the (symmetric) encoder transmission rate averaged over (N + 1) x 10000 source samples and compare it with 
the (sum) rate-distortion function, offered by [5], which presents an upper bound found to be tight for noise processes 
with identical variance. In Table [XT] we present some results to underline the effectiveness of our approach. The 
performance of the system without index assignments (Dec) and the performance obtained by using index-reuse (IR) 
is compared to the theoretically possible value as given by the (sum) rate-distortion function (R/D) according to [5]. 
Table entries labeled as N.A. (not available) indicate that those instance could not be considered here due to their 
high computational demand|*| Table entries labeled as N.B. (no benefit) indicate that in this case index-reuse does 
not outperform standard quantization. Notice that only the index assignments yielding the best possible performance 
were chosen for the experiments. 

The numerical results reveal that our index-reuse approach leads in many cases to significant performance 
improvements over standard quantization. It might happen, however, that our index assignments are not able to 
outperform scalar quantization. Whether or not this is true depends on several factors: (a) the quantizer resolution 

6 This instance would require a high-rate quantizer with a resolution larger than L = 16. 
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L, (b) the number of output bits R and (c) the correlation properties of the sources determined by Cg and A. In 
cases where the sources are weakly correlated, e.g. for large values of A, it becomes harder (or even impossible) to 
find index assignments that offer good rate/distortion trade-offs. In particular this might be true in our case due to 
the simplicity of the considered coding concept and the sub-optimality of the proposed index-reuse algorithm whose 
performance is highly dependent on the choice of L and R. 

VI. Concluding Remarks 

We presented a scalable solution for distributed source coding in large-scale sensor networks. Our methods rely on 
the combination of a simple encoding stage (a scalar quantizer and an index assignment stage) and a source-optimized 
clustering algorithm. Despite the simplicity of the proposed techniques, our results show significant performance gains 
in comparison with standard scalar quantization. It is worth mentioning that the same ideas can be used together with 
other distributed source coding schemes, e.g. those based on syndromes [19], on Diophantine index assignments [16] 
or even on random index assignments. As part of our ongoing work we are considering the case in which the 
covariance matrix of the sensor observations is not known beforehand. Thus, each sensor must decide on-the-fly 
which code to use and inform the decoder. Finding distributed clustering and coding algorithms for this problem 
remains a challenging task. 

Appendix I 
Optimal Decoding Rule 

In this section, we want to derive a simple expression for the optimal decoding rule. 

Let k € TV be the index identifying the source for which the estimate has to be calculated. Let T C J\f be a set 
of indices identifying the encoders whose codewords, collected in the vector wr £ Wt = TlvneT Wn> are available 
for the calculation. 

Specifically, we want to show that _E{[/fc|wr} = X^i'o" 1 E{U k \i k } -p(ifc|wr)- We start by using the definition 
of the conditional expectation and apply the Bayes rule such that 

E{U k \-w T } =/ u k -p{u k \vf T ) du k = — -■/ u k ■ p(w T \u k ) ■ p(u k ) du k . (26) 

Ju k =-oo P(wr) Ju k = -oc 

Furthermore, we can state that 

1^*1-1 

p(w r \u k ) — 22 P( w T\i k ) ■ p(i k \u k ) = 

i fc =o I 0, otherwise. 

Since the index i k — q k (u k ) is constant for all u k that fall into the quantizer region B k (i k ) such that b k (i k ) < u k < 
b k (i k + 1), the integral in d26t can be splitted into separate parts and we obtain 

E{U k \w r } = —, r- > p(w r \i k ) u k -p(u k )du k . (27) 

P( w r) f^ Ju k =b k (i k ) 

We observe that p(wr|*fe) = p ^ k ^|jj.'j'^ Wr " 1 an d that 

u k -p(u k ) du k = u k -p(u k \i k ) du k = E{U k \i k }, (28) 

■"■k= b k('k) Ju k = -oc 




p(ik 

where the equality (a) holds since p(i k \u k ) is either zero or unity depending on the fact if q k (u k ) = i k or, identically, 
if u k falls into the quantizer region B k (i k ) such that b k (i k ) < u k < b k (i k + 1). Therefore, we can state that 

p(u k \j k ) - P^hlhl - PfaK) • P(uk) _ J wjj' if q k (u k ) = ik ^ 



p(ik) p(ik) 



0, otherwise. 
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Using these results together with {27}, the desired equality can be established easily. 

Appendix II 

Efficient Marginalization and its Complexity 

In this section, we want to characterize the complexity of the marginalization operation required at several points 
of this work, e.g. consider the calculation of the optimal estimate in I0 where the marginalization in © has to be 
performed using the argument in {5]l. 

For a general treatment of the problem, we shall employ the same definitions as provided in Appendix [I] Let 
furthermore 5 = {k} U T be a set of indices identifying the sources whose discrete representations, collected in the 
vector is £ Ts = IlvnGS-^"' are mv °l ve d within the calculation. Specifically, we shall consider the calculation of 
p(ik = £|wt) out of p(wr,is) through the following marginalization 

p(i k = 1\wt) = 7 • ^2 p(wT,is) (30) 

Vi s el s :i fc = ! 

with 7 = l/p(wr) and 

p(w r ,is) =p(is) -p(wr|is) = p(is) -p(wr|ir) = p(is) • ] \ p(w n \i n ), (31) 

Vner 

where equality (a) obviously holds for k £ T, since in this case S = T, and also for k £ T, since in this case i*. 
does not provide any information about wr due to the fact that ir is known and w„ = m n (i n ) for all n £ T such 
that p(wr|is) = p(wr|ir). 

In the most straightforward implementation of the marginalization in \30\ , the summation over p(wr,is) has 
to be performed over all possible realizations of is £ Is with it — I where the actual value of p(wr,i,s) can be 
calculated using the product representation in < |3 1 1 - It is worth pointing out that p(wx|ir) in \3l\ can become either 
zero or unity depending on the current configuration of the transition probabilities p(w n \in) for all n £ T. This can 
be used to restrict the number of index tuples is £ Ts that have to be considered throughout the marginalization 
in BOb . as shown in the following. 

For brevity, we shall restrict ourselves to the case where k £ T, i.e. where S = tQ Let Qr(wr) be the set 
of index tuples ir £ It that are mapped onto wr £ WrB Then, the marginalization in d30t can be expressed as 
follows: 

p(i fc =/|w T ) = 7' p (' T ) 

Vi r £2l(w T ):i t = l 



= < 



7- P(>^)> if m k (i k = I) = w k 

vi T 6{i t =i}x2 n{li) (w TUt) ) (32) 

0, otherwise. 



Notice that the marginalization according to d32t has to be performed, if it has to be performed at all, only over the 
members of the set Sr\{fc} ( w r\{fc})- Since the cardinality of this set is much smaller than the cardinality of It 
in J30b the complexity of the marginalization can be reduced considerably. 

For a more detailed discussion of the complexity, the cardinality of the set Qr(wr) shall be characterized in 
the following. Notice that Q„(w n ) denotes the set of indices i n £ T n that are mapped onto the codeword w n £ W n - 
The following result is usefull 

7 The results for the case k £T can be derived accordingly. 

8 It is worth pointing out that St( w t) can he constructed easily since the mapping functions m n are assumed to be known for all n £ T. 
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Lemma 2.1: For any surjective mapping function m n : X n — > W„ and any w n £ W n , \Qn(w n )\ < \T n \ — 

|W»| + 1. 

Proof: Since m n is a function each i„ £ X n is mapped to exactly one w n £ Wn- From this we conclude 
that (a) there are no i„ £ T n that are mapped to more than one w n £ W n =>■ flviu ew Qn(*°n) — (mutual 
exclusivity) =>■ | Uv™„ew„ 2n(u» n )| = Z^vw ra 6W n and ( D ) eacn *« G ^» is mapped to some w„ £ W n => 

I Uvu; gw 2n( w 'i)l = l^n|- Since m n is a surjective function there exists an i n £ T n for any w n £ W n such 
that to„ = rn n (i n ) and we conclude that (c) \Q n (w n )\ > 1 for all w n £ W n . From (a) and (b) we obtain that 
5Zv™ gw |2™( u '™)l = \1-n\ which can be solved for an arbitrarily chosen w„ £ W„, e.g. w n = a, and we obtain 
\Qn{w n = a) | = |T n | - I]vi U „ew„:»„^a |Qn(tWn)|. Because of (c) we know that \Q n (w n = o)| is maximal if 
|<2n(?i>n)| = 1 for all w n £ W n :Wn=£ a and we obtain that |Q n (w n = a) < |2T re | — (|W»| — 1) for any a £ W n 
establishing the claim. ■ 
Using this result, the complexity of the marginalization in J30l > can be characterized. For the sake of a simple 
discussion, we assume that \T n \ = L for all n £ T and that \W n \ — K for all n £ T. Using Lemma |2T| and after 
defining the system specific parameter F = L — K + 1, we are able to conclude that in our case |Q n (w n )| < F 
for any n £ T . We furthermore assume that the elementary operations (like additions, multiplications, comparisons, 
look-ups, etc.) are of constant complexity, i.e. of 0(1). Specifically, we assume that p(ir) can be determined with 
a complexity of O(l), e.g. that it can be approximated, simulated, etc. with constant complexity or that it can be 
looked-up. 

In the following, we consider the newly derived expression for the marginalization J32b . In the case where 
rrik(ik = J) = Wk holds, it is easy to see that, in the worst-case, F' r ' _1 instances of p(i-r) are required throughout 
the calculation and that around F' r ' _1 additions have to be performed. Thus, around 2-F' r ' _1 elementary operations 
have to be performed corresponding to a computational complexity of 0{F^ T ^~ 1 ). In the case where rrik{ik — I) = Wk 
does not hold, the result of the marginalization becomes zero, without any further calculations, and the computational 
complexity derives to be of 0(1) for testing the case alone. 

Appendix III 
Complexity of Optimal Decoding 

In this section, which uses the same definitions as the previous appendices, we discuss the complexity of optimal 
decoding as required e.g. in {3]l- Specifically, we want to consider the calculation 

|2*|-1 

Wfc(wr) = ^2 u k ,i k ■ p(ifc|wr). (33) 

i k =0 

We observe that the calculation in l |33l > requires that Uk,i k an d p(«fc|wr) have to be determined, multiplied and 
summed-up for all possible realizations of ik £ Xfc where p(ifc|wr) can be derived from p(is) using the efficient 
marginalization described in Appendix [TT1 

In order to use the results derived in Appendix [TTJ we restrict ourselves to the case where k £ T, i.e. where 
S = T^For a simplified complexity analysis, we furthermore assume that \T„\ — L, |W«| = K and \ Q n (w n )\ < F 
for all n £ T where F = L — K + 1. Elementary operations (like additions, multiplications, comparisons, look-ups, 
etc.) are assumed to be of constant complexity, i.e. of 0(1). Specifically, we assume that Uk,i k and p(ir) can be 
determined with a complexity of 0(1), e.g. that they can be approximated, simulated, etc. with constant complexity 
or that they can be looked-up. 

'The results for the case k £ T can be derived accordingly. 
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Using the results of Appendix [II] we are able to state that the computational complexity for deriving p(ifc|wr), 
as required in ( 133b . is of 0(F , ' T ' _1 ) or of 0(1) depending on the fact whether ik is mapped onto u>k or not. 
In order to determine the overall complexity, we notice that the summation in d33t has to be performed over all 
ik G Ik- Therefore, we can employ Lemma |2~T1 in Appendix ITT1 to determine how often it will be true (at most) that 
rrik{ik) = Wk and, thus, how often (at most) the calculation of p(«&|wt) in © has to be performed. We conclude 
that this calculation has to be performed (at most) F times. The test if rrik(ik) = Wk is true, the look-up of Uk,i k as 
well as the multiplication in {33} can be neglected compared to the complexity of calculating p(ifc|wr) in total F 
times. Therefore, we are able to conclude that calculating one estimate has a computational complexity of 0(F' T '). 

Appendix IV 
Distortion Calculation 

In this section, which again uses the same definitions as the previous appendices, we shall show that the overall 
distortion associated with each source k G M can be described by the sum 

E{(U k - U k ) 2 } = E{(Uk - Ukf} + E{(U k - Uk) 2 } (34) 

where E{(Uk — Uk) 2 } is the distortion caused by the quantization stage and E{(Uk — Uk) 2 } is the distortion caused 
by the index assignment stage. 

To do so, we start with the definition of the expectation value and obtain 

E{(Uk-Uk) 2 }= J2 P(^r)E{(Uk-U k ) 2 \w T }= J2 P(^)E{(U k - U k f\W} 

where the equality (a) holds due to the fact that the index assignments m n are surjective functions for all n G T 
and, thus, the summation over all wr G Wt covers the same observation space as the summation over all ir £ It- 
Based on this observation, equation ( 1341 ) can easily be established by showing that 

E{(U k - U k ) 2 \ir} = E{(U k - U k ) 2 \ir} + E{(U k - U k ) 2 \W}- (35) 

The definition of the conditional expectation allows us to rewrite 

2 ( a ) r+°° 
E{(U k - U k ) 2 \ir} = / (uk(w r ) ~ Uk) 2 p(uk\ir)du k 

J U i, — — oo 



(b) 



(Mfc(w-r) -u k ) p(u k \ik)du k 

Uf e = — oo 



(c) 1 f b k(ik+V 

= ~t~. r / (wfc(wr) - u k ) p(u k )du k (36) 

PW Ju k =b k (i k ) 

where the definition of the conditional expectation is used in (a) together with the fact that w n = m n (i„) for all 
n G T, equality (b) is valid since ik is known if ir is known and, thus, p(uk\iT) = p(uk\ik) and equality (c) holds 
due to 129b , Assuming that 

i.e. that the reconstruction value of the quantizer is the centroid of the quantization region, it is possible to show that 
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the required integration can be split into two parto such that 



I (ufc(wr) - Uk) p(u k )du k = 

u k = b k(ik) 




Uk= b k(ik) 



{u k , 



Uk) p(u k )duk 



(38) 



+(w fe (w T ) - u k ,i k ) 2 p{uk,i k ), 



where p(uk.i k ) =p(*fe)- Plugging 038b into l !36l l. we obtain 



E{(U k -Uk) 2 \iT} 




(39) 



directly establishing {35} and, thus. 



the desired result in J34b . 



Appendix V 



Complexity of the Index-Reuse Optimization 



The computational complexity of the index-reuse algorithm presented in Algorithm [T] can be bounded by evaluating 
how often, in the worst-case, the operations within the innermost of the nested loops have to be performed. 

The outermost loop is executed for each of the L — K mergings that have to be performed to obtain mappings 
with K output codewords from the initial mapping with L output codewords. The second loop is executed for all 
considered encoders n £ fi, i.e. in total |fi| times. Finally, the innermost loop runs through all possibilities of choosing 
2 out of k codewords for the merging, i.e. we have Q) = 2 ](k-2)'. = i(^ 2 — ^0 possibilities. In the worst-case, i.e. 
in the initial case where k — L, we obtain \{L 2 — L) possibilities. Thus, the operations within the innermost loop 
have to be performed — K)\{L 2 — L) = ||fi|(L - K)L 2 — §|fi|(£ - K)L times in the worst-case. 

Now, to determine the overall complexity of the algorithm the complexities of the merging operation e„ — 
g(fn, a, b), the complexity of the distortion calculation d = dd(^ , £n) and the test if d < d* have to be determined. 
Assuming that the merging and the test can be performed with a constant computational complexity of 0(1), it remains 
to determine the complexity of calculating dd{^) given the current set of mapping functions £ n . This calculation 
requires the calculation of dd(n) according to ([§) which, in turn, requires the calculation of the estimate i„(wn) 
according to <[3j for all L'*' possible realizations of 6 Assuming that n G fl, i.e. that = |fi|, the result 

of Appendix |IIII directly applies here and we can state that calculating one estimate has a computational complexity 
of ©(f 1 ! ')^ Thus, in the most straightforward implementation, the computational complexity of calculating dj(n) 
according to © is of 0((£F) |n| ). 

Based on the presented results and after substituting F by L — K + 1, as derived in Appendix [TT] for surjective 
mapping functions, we are able to conclude that the overall computational complexity of the index-reuse algorithm 
is of 0{\D.\L M+2 (L - A') |n| + 1 ) showing an exponential growth with 

10 This can be achieved by substituting Ufc(w^r) = Mfe,i fc +d k , where d k = %(wt) ~^k,i k < such that (k^wt) — Uk) 2 can be expressed 
as (uk,i k + d k - u k ) 2 which derives to (uk,i k - u k ) 2 + 2d k {uk,i k ~ u k) + d\. 

"There are more efficient ways to calculate d d (n) based on intermediate results. However, due to lack of space, the discussion is neglected 
here. 

12 The results for the case where n ^ Q can be derived accordingly. 
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Appendix VI 

Efficient Sub-Optimal Decoding and its Complexity 

In this section, which uses the same definitions as the previous appendices, we shall elaborate on how a source 
factorization based on CCREs according to l llOt and i ll It can be used for efficient decoding. This shall be achieved 
by assuming that the factorization J 1 01 > also holds for the discrete cas such that 

M M 

P« = II = II PCUJia™), (4°) 

m— 1 m — 1 

a fact, which can be exploited for scalable decoding as shown in [2] for a similar system setup. Since the decoder 
design considered in this work follows the same principles, we shall focus on the differences resulting from system 
specific properties. 

In Section |n| we have shown that the calculation of the optimal estimate u„(w) according to {3} for n £ A/ 
requires the calculation of the probabilities 

p(i n = Z|w) = 7 • f( w > } ) = 7 • (41) 

Vi6I:i„=! VieQ(w):i„=Z 

where the equality (a) is due to the result derived in Appendix [II] Replacing p(i) by its approximation p(i) as given 
by the factorization in J40t . we obtain the following approximation 

p{i n = i| W ) = 7 • e pw=7- e (n ^ 

ViGQ(w):i„=i ViGQ(w):i„=i \m=l / 

which can be calculated efficiently for all I £ T n and for all n £ TV by running the sum-product algorithm on the 
factor graph representation of the factorization in i40\ . For a general treatment of factor graphs and the sum-product 
algorithm please refer to [14] or to [2] where a similar system setup is discussed. 

In order to provide the fundamentals, we include a brief review here. A factor graph is a bipartite graph that 
consists of variable and function nodes and expresses how a (global) function factors into (local) functions. The 
variable nodes represent the arguments of the functions and the function nodes the (local) functions itself. The sum- 
product algorithm allows us to perform the (global) marginalization in d42t based on (local) marginalizations of the 
following type 

A»m-.n(0 = E II Ms^m (43) 

which are performed in a structured way for all n £ <S m and m £ M. Following the intuition in [14], the results 
of the marginalizations in j43\ for I — 0, 1, |T„| — 1 can be seen as messages represented a vector p, m ^ n = 
(/i m ^„(0), /i m _»„(l), . . . , (i m ^ n (\l n \ — 1)) that are sent from the function node m £ M. to the variable node n £ M 
for further processing. Similarly, the inputs of the marginalizations in {43} can also be seen as messages /Lt 9 ^ m that 
were received at the function node m originating from some variable nodes g £ N ■ Those messages represent the 
product 

Ms^m(fc) = Yl Vh^ g {k) (44) 

VheM:g£S h ,h^m 

13 This assumption is plausible since U n — > I„ forms a Markov chain for n = 1, 2, , . , , N, 
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for k = 0, 1, \T g \ — 1 and, thus, \i g ^ m = (fi g ^ m (0), n g ~, m (l), ■ ■ ■ , fi g ^ m {\I g \ - 1))- Using this abstraction, the 
techniques described in [14] can be directly applied here giving rise to an efficient calculation of d42t[^l In particular 
this is achieved by running an appropriate message passing algorithrJl along the factor graph representation of d40t 
and depending on the fact if the message passing procedure terminates or not, i.e. if the factor graph is cycle-free or 
notQ the exact or an approximated value of p(i„ = Z|w) is obtained simultaneously for all I G T„ and n G A/". 

Since the presented decoding scheme is based on message passing, its overall complexity can be analyzed by 
considering all messages that are created during the decoding process and jointly evaluating their complexity. We 
notice that the calculation of the messages at each function node m £ M according to d43l > requires a marginalization 
of the same type as discussed in Appendix [TT1 Assuming that furthermore L n = L and K n = K for n = 1, 2, . . . , N 
and that the complexity of elementary operations are the same as stated in Appendix Ull the derived results directly 
apply here and we are able to conclude that the messages at the function nodes m G M can be created with a 
computational complexity of 0(F^ S "^). Considering the messages created at the variable nodes n G Af according 
to ( 144b . it is easy to see that the messages can be derived with a computational complexity of O(L). We notice that 
the complexity of calculating the messages at the function nodes is higher than at the variable nodes since generally 
|<S m | > 1 for all (but maybe one) m G M, i.e. the complexity of calculating the messages at the variable nodes can 
be neglected here. In order to provide an expression for the complexity, we have to distinguish between two cases. 

In the case where the factor graph is cycle-free, the efficient forward-backward algorithm, see [14], can be used 
for message passing and only one message (in each direction) needs to be passed along each edge within the graph. 
Assuming that |<S m | < S for all m G M, the calculation in d43t has to be performed at most M ■ S times leading to 
a computational complexity of 0(MSF s ). In the case where the graph has cycles, the message passing has to be 
performed in an iterative way for an reasonable amount of iterations T >> 1, see [14], and we obtain a computational 
complexity of 0{TMSF s ). 



Appendix VII 
Complexity of Source-Optimized Clustering 

The complexity of source-optimized clustering used in Section HV-B I shall be discussed next^In [12] it is shown that 
hierarchical clustering, upon which the presented procedure is based, has a computational complexity of 0(N 2 log A^). 
However, these results do not directly hold for source-optimized clustering since for each step of the procedure, i.e. for 
each merging performed, the differential KLD benefit AD'(A' k , A[) according to H6\ has to be calculated. Looking 
at d!6t in more detail, we observe that merging cluster A' k and A\ requires the calculation of the KLD benefit 
AD(A' k U A'i, 0) according to d!4t which, in turn, requires the calculation of the determinant for the corresponding 
covariance matrix R a j_ua;- Using the general definition of determinants, it is easy to see that it can be calculated 
using Gaussian elimination. Assuming that the matrix, whose determinant has to be derived, is of size N x N, then 

14 It is worth pointing out that the expression in {43} is optimized to minimize the marginalization complexity by using knowledge about the 
received codewords. This in turn means that in this particular setup the function nodes have to be initialized and not the variable nodes as in 
conventional implementations. Specifically, the function nodes are initialized by defining the set Qs ( w S m ) using knowledge of ws m for 
m = 1, 2, . . . , M and the variable nodes are initialized with trivial messages. 

15 For factor graphs without cycles the efficient forward-backward algorithm can be employed, see [14]. 

16 Using the result in [2] this can be ensured if \B m \ = 1 for m = 1, 2, . . . , M. 

"The main goal at this point is to show that source-optimized clustering is of polynominal complexity (considering the number of sources 
N) and not to find an exact expression for the complexity of Algorithm [5] This would exceed the scope of this work. 
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the complexity of the Gaussian elimination and, thus, also of calculating the determinant is of 0(N 3 ), see e.g. r25ll 18 l 
Since |A^ U Aj| is always smaller or equal to N, the matrix R a j_ uA j is (at most) of size N x N and, thus, the 
calculation of |R<a'uA'I ls °f 0(N A ). Assuming that the complexity of performing one merging step within the 
classical hierarchical clustering algorithm is of 0(1), i.e. the minimum possible, then the number of mergings to 
be performed can be bounded by 0(N 2 log N). For the source-optimized clustering procedure this means that its 
computational complexity is of 0(N logiV). 

Appendix VIII 
Complexity of Source-Optimized Linking 

This section addresses the complexity of constructing the source-optimized factorization presented in Section HV-CI 
Using the result in [17], we are able to conclude that the directed spanning tree algorithm, upon which the presented 
procedure is based on, can be implemented with a complexity of 0(C log C). However, beside this, also the complexity 
of preprocessing the data required to initialize the directed spanning tree algorithm has to be considered. In particular 
this means that the link costs have to be determined before the directed spanning tree algorithm can be employed. 
In total there are C 2 link costs c k .i representing the KLD benefit associated with establishing a link between cluster 
Afc, k € r, and A;, I 6 T, that have to be calculated according to {25). In order to calculate this link cost all 
possible combinations of V' k € T(A,Ak) and Q[ G T(B,Ai) have to be evaluated as stated in d24t . It is easy 
to see that the number of such combinations is given by the product between \T(A, At)\ and \T(B,Ai)\ where 
\T(A,A k )\ = ('^f') and \T{B,Aj)\ = ('3''). For simplicity, we assume in the following that A = B = f and 

that I A c I = S for c= 1. 2 Ol 19 l After simple mathematical manipulation we are able to conclude that there are 

at most 2 s log2 s such combinations. It remains to derive the complexity of calculating the argument in J24b . i.e. 
the complexity of calculating AD* (V' k , Q[) according to J22t . which is clearly determined by the complexity of 
calculating AD(V' k U Q[, 0) according to d 1 41 >. Using the same arguments as in Appendix I VIII we can state that the 
complexity of calculating the determinant |R-p^ u g/ 1 in l !14b is of 0(\V' k U Qj| 3 ), i.e. it is of 0(S 3 ) using the same 
assumptions as before. Putting everything together, we are able to conclude that the calculation of all link cost is 
of 0(C 2 2 s lo ^ s S 3 ) = o(C 2 2 {3+s)1 ° g 2 s ). The computational complexity of the overall source-optimized linking 
procedure is then given by the sum of the derived complexities, i.e. of the directed spanning tree algorithm and the 
link cost calculation. Since the complexity of the algorithm can be neglected here, we conclude that the complexity 
of source-optimized linking is of 0(C 2 2^ 3+s - ) log2 s ) which is only feasible for small values of S. 
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